Data integration

Here we integrate the scRNA of the mouse frontal cortex (Dropviz dataset) and a spatial transcriptomics dataset of the same region (STARmap).

knitr::opts_chunk$set(echo = TRUE)
options(rgl.useNULL=TRUE)
library(rgl)
library(ggplot2)
library(knitr)
library(rglwidget)
library(viridis)
install.packages('devtools')
library(devtools)
install_github('welch-lab/liger')
library(rliger)
library(dplyr)
knit_hooks$set(webgl = hook_webgl)

Step 1: Download the data

First, read in your datasets. For this tutorial, we will use two matrices, which can all be downloaded at https://www.dropbox.com/sh/y9kjoum8u469nj1/AADik2b2-Qo3os2QSWXdIAbna?dl=0 . The scRNA mouse dataset (Dropviz_mouse_staranalysis.RDS) is 28,366 genes by 71,639 cells. The spatial transcriptomic STARmap dataset (STARmap.RDS) is 28 genes by 31,294 cells.

dropviz = readRDS("Dropviz_starmap_vig.RDS")
starmap = readRDS("STARmap_vig.RDS")

Step 2: Preprocessing and normalization

Next, create your Liger object, submitting the datasets in list format. The unshared features should not be subsetted out, or submitted separately. Rather, they should be included in the matrix submitted for that dataset. This helps ensure proper normalization.

liger <- createLiger(list(starmap = starmap,dropviz = dropviz))

Normalize the datasets. The normalization is applied to the datasets in their entirety.

liger <- normalize(liger)

Select shared, homologous genes between the two species, as well as unshared, non-homologous genes from the lizard dataset. We also want to make sure that we use all 28 of the STARmap genes, so we manually set the variable genes shared between datasets.

liger <- selectGenes(liger, var.thres= 0.3,unshared = TRUE, unshared.datasets = list(2), unshared.thresh= 0.3)
## [1] "Selected 2775 unshared features from dropviz Dataset"
liger@var.genes = rownames(liger@raw.data$star)

Next, scale, but do not center, the data.

liger <- scaleNotCenter(liger)

Step 3: Joint Matrix Factorization

To factorize the datasets including unshared datasets, set the use.unshared parameter to TRUE. We also use a vectorized lambda by setting the vectorized.lambda parameter to TRUE, and providing the desired lambda values as a list, such that the first provided lambda is the penalty for the first dataset, and the second lambda penalty (1) is applied to the second dataset. Another noteworthy advantage of UINMF is that we are able to use a larger number of factors than there are shared features. We captilize on this by changing the default value of k to 40; however, it should be noted that the default value of k = 30 works effectively as well (Please see UINMF publication for further details: https://www.biorxiv.org/content/10.1101/2021.04.09.439160v1.full )

liger <- optimizeALS(liger,  lambda = list(10,1) , use.unshared = TRUE, max.iters = 30, thresh=1e-10, k =40, vectorized.lambda = TRUE)
## [1] "Performing Factorization using UINMF and unshared features"
## [1] "Processing"
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Current seed  1  current objective  179906197
## Best results with seed 1.

Step 4: Quantile Normalization and Joint Clustering

After factorization, the resulting Liger object can used in all downstream LIGER functions without adjustment. The default reference dataset for quantile normalization is the larger dataset, but the user should select the higher quality dataset as the reference dataset, even if it is the smaller dataset. In this case, the Dropviz dataset is considered higher quality than the STARmap dataset, so we set the Dropviz dataset to be the reference dataset.

liger <- quantile_norm(liger, ref_dataset = "dropviz")
liger <- louvainCluster(liger)
## Louvain Clustering on quantile normalized cell factor loadings.

Step 5: Visualizations and Downstream processing

liger <- runUMAP(liger, n_neighbors = 35)

Next, we can visualize our returned factorized object by dataset to check the alignment between datasets.

umap_plots <-plotByDatasetAndCluster(liger, axis.labels = c("UMAP1","UMAP2"), return.plots = TRUE)
umap_plots[[1]]

umap_plots[[2]] 


We can also use the Dropviz labels to help us annotate the clusters.

mouse_annies = readRDS("Dropviz_general_annotations.RDS")
umap_plots <-plotByDatasetAndCluster(liger, axis.labels = c("UMAP1","UMAP2"), return.plots = TRUE, clusters = mouse_annies)
umap_plots[[2]] 


# Step 6: Annotations and visualization within a 3D space Here we demonstrate how to use the annotation labels derived from the above analysis within the context of 3D space. We provide the annotation labels for the sake of simplicity. These labels were generated using the high quality Dropviz annotationsto re-annotate the STARmap cells after completing the above analysis. We have provided the exact annotations and colors used in the publication such that the interested user may captilize on the 3D dimensional sample space.

coords_annotations_3d <- readRDS("STARmap_3D_Annotations.RDS")

Here we graph all of the cells:

rgl.viewpoint(theta = 25, phi = 5, zoom = 0.7)
plot3d(x=coords_annotations_3d$Coord1, y=coords_annotations_3d$Coord2, z=coords_annotations_3d$Coord3, xlab="", ylab = "", zlab ="",col = coords_annotations_3d$Cell_Type_Color, size = 2, axes= FALSE, labels = FALSE)
aspect3d(1.7,1.4,0.1)
first = c(0)
second = c(0)
bg3d("black", labels = FALSE)
axes3d(edges = "bbox",col='white', labels=FALSE, tick=FALSE)

Graph just the neuronal cells

neuronal_cells <- filter(coords_annotations_3d , coords_annotations_3d$General_Class == "Neuron")
plot3d(x=neuronal_cells$Coord1, y=neuronal_cells$Coord2, z=neuronal_cells$Coord3, xlab="", ylab = "", zlab ="", col = neuronal_cells$Sub_Color, size = 3, axes= FALSE, labels = FALSE)
aspect3d(1.7,1.4,0.1)
first = c(0)
second = c(0)
bg3d("black", labels = FALSE)
axes3d(edges = "bbox",col='white', labels=FALSE, tick=FALSE)

Graph just the interneurons

interneuron_cells <- filter(coords_annotations_3d , coords_annotations_3d$General_Class == "Interneuron")
plot3d(x=interneuron_cells$Coord1, y=interneuron_cells$Coord2, z=interneuron_cells$Coord3, xlab="", ylab = "", zlab ="", col = interneuron_cells$Sub_Color, size = 3, axes= FALSE, labels = FALSE)
aspect3d(1.7,1.4,0.1)
first = c(0)
second = c(0)
bg3d("black", labels = FALSE)
axes3d(edges = "bbox",col='white', labels=FALSE, tick=FALSE)

Graph just the oligodendrocytes and polydendrocytes

oligo_poly_cells <- filter(coords_annotations_3d , coords_annotations_3d$General_Class == "Oligo_Poly")
plot3d(x=oligo_poly_cells$Coord1, y=oligo_poly_cells$Coord2, z=oligo_poly_cells$Coord3, xlab="", ylab = "", zlab ="", col = oligo_poly_cells$Sub_Color, size = 3, axes= FALSE, labels = FALSE)
aspect3d(1.7,1.4,0.1)
first = c(0)
second = c(0)
bg3d("black", labels = FALSE)
axes3d(edges = "bbox",col='white', labels=FALSE, tick=FALSE)

Graph only the endothelial cells

endo_cells <- filter(coords_annotations_3d , coords_annotations_3d$General_Class == "Endothelial")
plot3d(x=endo_cells$Coord1, y=endo_cells$Coord2, z=endo_cells$Coord3, xlab="", ylab = "", zlab ="", col = endo_cells$Sub_Color, size = 3, axes= FALSE, labels = FALSE)
aspect3d(1.7,1.4,0.1)
first = c(0)
second = c(0)
bg3d("black", labels = FALSE)
axes3d(edges = "bbox",col='white', labels=FALSE, tick=FALSE)